Abstract

Large unstructured audio data sets have become ubiquitous and
present a challenge for organization and search. One logical
approach for structuring data is to find common speakers and
link occurrences across different recordings. Prior approaches
to this problem have focused on basic methodology for the linking
task. In this paper, we introduce a novel trainable non-parametric
hashing method for indexing large speaker recording data sets.
This approach leads to tunable computational complexity methods
for speaker linking. We focus on a scalable clustering method
based on hashing, canopy clustering. We apply this method to a
large corpus of speaker recordings, demonstrate performance
tradeoffs, and compare to other hashing methods.

Index  Terms:  speaker  recognition,  clustering,  hashing,  locality 
sensitive  hashing. 


1.  Introduction 

We assume that a large corpus of audio recordings with reoccurring
speakers is given. Our goal is to structure this corpus in two ways.
First, we want to explore methods for quickly performing speaker
query-by-example (QBE); i.e., given a recording from a speaker, find
all recordings by the same speaker in our corpus. Second, given a QBE
method, we ask how to perform speaker clustering: each cluster should
contain a single speaker, and a cluster should contain all recordings
from that speaker in the corpus. The result of these two steps is a
structured organization of the corpus by speaker, so that we can
quickly find the same speaker in multiple recordings.

Two critical tools for speaker linking in large corpora are
speaker diarization and speaker embedding. For the first part,
since we want to focus on the speaker linking aspect, we assume
that diarization has been performed and each recording in our
corpus contains only a single speaker. For speaker embedding,
we convert our recordings to a (single) speaker vector. This
process can be performed with many approaches [1, 2]. For this
paper, we focus on using i-vectors [2], but the methods apply to
any embedding.

For the task of speaker QBE and recognition, multiple methods have
been proposed. First, the most straightforward method [3] is, given a
query vector, $x_q$, to perform inner products with all vectors in the
corpus, $X = \{x_1, \ldots, x_n\}$. This approach is $O(n)$
computation and $O(n)$ storage, so it grows linearly with the corpus
size. A second approach, graph-based query by example, uses a graph
structure along with random walks to perform retrieval [4]. This
approach is computationally $O(1)$ for retrieval, and requires $O(n)$
for storage. A drawback of the method is that it requires an $O(n^2)$
computation for setting up the retrieval data structure. A third
approach for QBE is to use (locality-sensitive) hash function methods.
Speaker vectors are first converted to binary form via a hash
function, and then retrieval is performed using standard computer
science hash table methods. Multiple authors have explored this
approach [5, 6, 7]. The advantage of this approach is that storage is
$O(n)$ and average retrieval complexity is $O(1)$.
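
As a point of reference for the first (exhaustive) method, the
following minimal sketch scores a query against every corpus vector
with an inner product. The function names, the synthetic Gaussian
vectors, and the noise level are illustrative assumptions, not part of
the system described in this paper.

```python
import math
import random

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def brute_force_qbe(query, corpus, top_k=3):
    """Score the query against every corpus vector: O(n) inner products."""
    scores = [(sum(q * c for q, c in zip(query, vec)), idx)
              for idx, vec in enumerate(corpus)]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:top_k]]

random.seed(0)
# Synthetic stand-ins for speaker vectors (assumption for illustration).
corpus = [unit([random.gauss(0, 1) for _ in range(16)]) for _ in range(100)]
# A noisy copy of recording 42 stands in for a new cut from the same speaker.
query = unit([c + 0.05 * random.gauss(0, 1) for c in corpus[42]])
hits = brute_force_qbe(query, corpus)
```

Every query touches all $n$ vectors, which is the linear cost that the
hashing methods are meant to avoid.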

For this paper, we use the hash function approach to speaker
QBE. Prior approaches have used locality sensitive hashing
via random projection [5, 6]. In this paper, we look at
non-parametric trainable methods to hashing. The goal is to create
data-specific hashing functions with better performance.
Non-parametric methods offer the advantage of being a consistent
estimator of the underlying probability density function of the
data without any prior knowledge [8, 9]. Note that for this paper,
we focus on speaker QBE for speaker clustering by using
hash functions only and no graph structure. But, speaker QBE
with hash functions can be combined with random walks by
retrieving a local k-hop graph in the same manner as [4].

Our speaker application combines speaker QBE with linking and
clustering. Prior methods for linking and clustering speakers have
been studied in multiple contexts including speaker content graphs
[10], as a corpus level linking problem [11], as a diarization
problem [12], and as a graph clustering problem [13]. The basic
approach is to find speaker links, i.e., to decide whether two
recordings are from the same speaker. Putting this all together gives
a graph where nodes are recordings and weighted edges represent the
confidence of the links. Clustering can be considered an extension of
this process where the links must be transitively consistent; i.e.,
given that recordings A and B are the same speaker, and B and C are
the same speaker, then A and C are the same speaker. Another way to
say this is that the speaker content graph is a union of cliques [14].

Our approach to speaker linking and clustering consists of
three parts. First, we convert all recordings in the data set to
speaker vectors and then use a hash function to convert the data
to a binary representation. Second, we index this binary
representation using hash tables. Third, we use multiple retrievals to
calculate a subset of the full distance matrix to reduce computation
by a tunable amount. This approach is inspired by canopy
clustering [15]. The resulting speaker method can be tuned both
computationally and storage-wise to achieve different levels of
performance.

The outline of the paper is as follows. In Section 2, we
cover speaker QBE using hashing methods. We review the standard
random projection approaches as well as introducing our
non-parametric approaches. In Section 3, we discuss clustering
and detail the combination of hashing and canopy clustering.
Finally, in Section 4, we apply our methods to a large NIST
speaker corpus and detail experimental results, trade-offs, and
performance.


2.  Hashing  Techniques 

Hashing is a common technique for finding nearest neighbors
of a given vector, $x$. A hashing function is defined,

$$h(x) : \mathbb{R}^d \to B^l \qquad (1)$$

where $B = \{0, 1\}$, $d$ is the dimension of the input vector, and
$l$ is the number of bits. The hashing function should have the
property that if $x$ and $y$ are close, their corresponding hash
values $h(x)$ and $h(y)$ have close Hamming distance.

A common technique used in hashing is to define $m$ randomly
selected hashing functions $h_i$ from a family of hash functions
all with the same number of output bits $l$. Given a set of
vectors $X = \{x_i\}$, $i = 1, \ldots, n$, the data are encoded using
all of the hash functions, $h_i(x)$. A retrieval function,

$$\mathcal{R}_i : B^l \to \mathcal{P}(\{1, \ldots, n\}), \qquad \mathcal{R}_i(b) = \{\, j \mid h_i(x_j) = b \,\} \qquad (2)$$

maps bits to the set of indices of the data; $\mathcal{P}(\cdot)$
denotes the power set. Multiple functions can be used to ensure that a
close neighbor vector will be eventually retrieved. The retrieval can
be implemented with $O(1)$ average search complexity using an
inverted index. More details on multiple hash functions and
implementation can be found in [16].
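
The inverted index behind the retrieval function can be sketched with
ordinary dictionaries, one table per hash function. This is a minimal
illustration under stated assumptions: the names `build_tables` and
`retrieve` are hypothetical, and the hash codes are hand-written
tuples rather than the output of a real hash function.

```python
from collections import defaultdict

def build_tables(codes_per_function):
    """codes_per_function[i][j] is the l-bit code (a tuple) of vector j under hash i."""
    tables = []
    for codes in codes_per_function:
        table = defaultdict(set)
        for j, bits in enumerate(codes):
            table[bits].add(j)          # inverted index: code -> indices
        tables.append(table)
    return tables

def retrieve(tables, query_codes):
    """Union over all hash functions of the indices sharing the query's code."""
    hits = set()
    for table, bits in zip(tables, query_codes):
        hits |= table.get(bits, set())  # average O(1) lookup per table
    return hits

# Toy codes for four vectors under two 2-bit hash functions (illustrative).
codes = [
    [(0, 1), (0, 1), (1, 1), (1, 0)],   # hash function 1
    [(1, 0), (0, 0), (0, 0), (1, 1)],   # hash function 2
]
tables = build_tables(codes)
neighbors = retrieve(tables, [(0, 1), (0, 0)])
```

In this toy case the first table returns indices 0 and 1 and the
second returns 1 and 2, so the union is {0, 1, 2}. A vector missed by
one hash function can still be recovered by another.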


2.1.  Locality  Sensitive  Hashing  (LSH) 

A standard baseline method for vector-based hashing is Locality
Sensitive Hashing (LSH) [17, 18]. LSH provides probabilistic
bounds for near items having the same hash value. A typical
method for implementing LSH for vectors is to use random
projection.

In more detail, assume that we want to encode a unit norm
vector $x$. A random matrix, $M_i$, with $l$ columns of dimension
$d$ is generated. Then hash values for the $i$th hash function,
$h_i(\cdot)$, are generated by,

$$h_{i,j}(x) = \begin{cases} 1 & \text{if } M_{i,j}^{T} x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where $j$ indicates the $j$th bit of the output and $M_{i,j}$ is the
$j$th column of $M_i$. Intuitively, each bit represents the side of
the hyperplane where the vector lies.
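
A minimal sketch of the random projection encoding in (3) follows. The
Gaussian distribution of the matrix entries and the names `make_lsh`
and `lsh_encode` are assumptions for illustration; the text above only
specifies a random matrix.

```python
import random

def make_lsh(dim, n_bits, rng):
    """Draw n_bits random hyperplane normals (the columns of M_i); Gaussian by assumption."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_encode(planes, x):
    """Bit j is 1 when x lies on the positive side of hyperplane j, as in (3)."""
    return tuple(1 if sum(m * c for m, c in zip(plane, x)) > 0 else 0
                 for plane in planes)

rng = random.Random(0)
planes = make_lsh(dim=8, n_bits=12, rng=rng)
x = [rng.gauss(0, 1) for _ in range(8)]
code = lsh_encode(planes, x)
code_scaled = lsh_encode(planes, [2.0 * c for c in x])   # same direction as x
code_flipped = lsh_encode(planes, [-c for c in x])       # opposite direction
```

Because only the sign of each projection is kept, the code depends on
the direction of $x$ and not on its length: a positively scaled copy
receives the same code, and the negated vector receives the
complementary code.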


2.2.  Distance  Based  Hashing  (DBH) 

An alternative to LSH is distance-based hashing (DBH). Our
implementation closely follows the original algorithm [19] with
the exception of compute performance optimizations.
Distance-based hashing was chosen for comparison because of its
similarities with other non-parametric methods.

As with non-parametric hashing presented in the next section,
distance-based hashing is initialized by selecting a set of
$2l$ seed vectors that will serve as a basis for a hashing function
to encode a target vector. These seed vectors should be chosen
from a set of data that will be representative of the target data to
be encoded.

The base function used for hashing is the pseudo-line-projection
proposed in [19],

$$f_{x_1,x_2}(x) = \frac{d(x, x_1)^2 + d(x_1, x_2)^2 - d(x, x_2)^2}{2\, d(x_1, x_2)} \qquad (4)$$


Algorithm 1 Non-Parametric Hashing (NPH)

Input: Speaker vector, $x$
Outputs: Encoded vector, $h_{\mathrm{NPH},i}(x)$, of $l$ bits for $i = 1, \ldots, m$

For each hashing function, choose $S_i = \{s_{i,1}, \ldots, s_{i,l}\}$
seed speaker vectors from an initialization data set and
$T_i = \{\tau_{i,1}, \ldots, \tau_{i,l}\}$ tolerance intervals using
the $k$-nearest neighbors

for $i = 1, \ldots, m$ do
    for $j = 1, \ldots, l$ do
        if $d(x, s_{i,j}) < \tau_{i,j}$ then
            $h_{\mathrm{NPH},i,j}(x) = 1$
        else
            $h_{\mathrm{NPH},i,j}(x) = 0$
        end if
    end for
end for


In (4), $d(\cdot,\cdot)$ is the Euclidean distance between two
vectors, and the vectors $x_1$ and $x_2$ are (fixed) seed vectors.
Encoding of an input vector is accomplished in a similar manner to
LSH in (3) using the pseudo-line-projection (4),

$$h_{\mathrm{DBH},i,j}(x) = \begin{cases} 1 & \text{if } f_{x_1,x_2}(x) \in [t_{j,1}, t_{j,2}] \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

where $(x_1, x_2)$ is the $j$th pair of seed vectors. Points are
encoded depending on whether they fall within or outside an interval.
The interval is defined by thresholds $t_{j,1}$ and $t_{j,2}$. In
order to form two reference points for the pseudo-line-projection
equation, the $2l$ initialization vectors are randomly selected to
form $l$ pairs. The bits of the hash are computed by equation (5)
over the $l$ pairs, forming an $l$-bit hash.
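
The pseudo-line-projection and the interval test can be illustrated
with a small sketch. The function names, the two-dimensional seed
pairs, and the threshold values below are assumptions chosen so the
arithmetic is easy to check by hand; how the thresholds are selected
in practice is not covered here.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def line_projection(x, x1, x2):
    """Pseudo-line-projection of x onto the line through seeds x1 and x2."""
    d12 = dist(x1, x2)
    return (dist(x, x1) ** 2 + d12 ** 2 - dist(x, x2) ** 2) / (2.0 * d12)

def dbh_encode(x, pairs, intervals):
    """Bit j is 1 when the projection onto seed pair j falls inside [t1, t2]."""
    return tuple(1 if t1 <= line_projection(x, x1, x2) <= t2 else 0
                 for (x1, x2), (t1, t2) in zip(pairs, intervals))

# Two hypothetical seed pairs and intervals for a 2-bit code.
pairs = [([0.0, 0.0], [4.0, 0.0]), ([0.0, 0.0], [0.0, 2.0])]
intervals = [(0.5, 2.0), (0.0, 1.0)]
x = [1.0, 3.0]
proj = line_projection(x, *pairs[0])
code = dbh_encode(x, pairs, intervals)
```

For the first pair the seeds lie on the horizontal axis, so the
projection of (1, 3) is its first coordinate, 1, which falls inside
[0.5, 2.0]. For the second pair the projection is 3, which falls
outside [0, 1], giving the code (1, 0).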


2.3. Non-Parametric Hashing (NPH)

Non-parametric pattern recognition/clustering methods have
the advantage of being an unbiased representation of the data
set modeled. The disadvantage of these techniques is that they
are computationally expensive. The construction of our
non-parametric hashing (NP-hashing) leverages the computational
speed of hashing with the desirable modeling qualities of a
non-parametric approach.

The basic encoding approach is shown in Algorithm 1. As
with distance-based hashing, the first step of NPH is the selection
of a set, $S$, of $l$ seed vectors. Also, the seed vectors should
be chosen from a set of vectors that are closely representative of
the target data. Tolerance intervals or hyperspheres are formed
by using the $S$ seed vectors as a set of centers. The set of
tolerance intervals, $T$, is then formed by using the largest distance
from the $k$ nearest neighbors to these centers.

The input speaker vectors are encoded into an $l$-bit hash by
comparing against these tolerance intervals. An input speaker
vector is compared against each tolerance interval. If the input
falls within a tolerance interval, the corresponding bit is coded as
a 1. If it lies outside the tolerance interval, the bit is coded as
a 0. In this manner, the $l$-bit hash is encoded.
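
The following sketch illustrates one NPH hash function: seed
selection, tolerance radii from the $k$ nearest neighbors, and the
encoding loop of Algorithm 1. It is a minimal illustration under
assumptions: the names `train_nph` and `nph_encode`, the synthetic
initialization data, and the values of $l$ and $k$ are not taken from
the experiments. Using $m$ hash functions would repeat the training
with independently drawn seeds.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def train_nph(init_data, n_bits, k, rng):
    """Pick n_bits seeds; each radius is the distance to the seed's k-th nearest neighbor."""
    seeds = rng.sample(init_data, n_bits)
    radii = []
    for s in seeds:
        # Distances to every other initialization vector, nearest first.
        neighbor_dists = sorted(dist(s, v) for v in init_data if v is not s)
        radii.append(neighbor_dists[k - 1])   # largest of the k nearest distances
    return seeds, radii

def nph_encode(x, seeds, radii):
    """Bit j is 1 when x falls inside the hypersphere around seed j."""
    return tuple(1 if dist(x, s) < r else 0 for s, r in zip(seeds, radii))

rng = random.Random(0)
# Synthetic initialization set standing in for training speaker vectors.
init_data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
seeds, radii = train_nph(init_data, n_bits=8, k=5, rng=rng)
code = nph_encode(seeds[0], seeds, radii)
```

Because each radius adapts to the local density around its seed, a
sphere in a sparse region is wide and a sphere in a dense region is
narrow, which is what makes the hash data-specific.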


3. Clustering Techniques

3.1. Prior Methods

The computation and storage involved in clustering is a major
issue for large-scale implementation. The two basic steps involved
are the distance matrix computation and the clustering algorithm.
In this paper we address the former and consider the latter an area
with many choices; see, for example, the many graph-based and
standard methods in [13].

A straightforward approach to distance matrix computation
is to compute all pairs of distances $d(x, y)$ and store them in a
matrix. This involves $O(n^2)$ storage and $O(n^2)$ flops where $n$
is the number of vectors to cluster. These resource requirements
become large quickly; for 100K vectors, we require 80 GB to
store the distance matrix.

A step in the right direction is to sparsify the distance matrix
by selecting the $k$ closest neighbors. This approach was used
with success in [10, 13]. This results in a matrix with storage
$O(n)$ for fixed $k$, but computation requirements are still $O(n^2)$.

To reduce the computation burden, we use the hashing techniques
from Section 2 to limit computation. Specifically, our
approach is to use hashing to retrieve a candidate set of nearest
neighbors and then compute the distance only for those neighbors;
we detail this canopy clustering approach more in the next
section.

3.2. Canopy Clustering

Canopy clustering consists of three basic parts. First, we construct
canopies, $C$, using hashing functions. Second, distances
are computed based on the canopies. Third, a clustering technique
is performed using the computed distances.

The first step, canopy construction, is shown in Algorithm 2.
The basic flow of the algorithm is to pick a random
member of the vectors to cluster and retrieve everything close
using hashing. This process is repeated until the entire set is
covered by the resulting canopy, $C$, which is a set of sets.

The second step of canopy clustering is distance computation.
For each canopy, $C_i$, all of the distances are computed
exhaustively in that canopy, i.e., $d(x_j, x_k)$ for all $j, k$ in
$C_i$. Note that distinct canopies may have common members so the
resulting distance matrix has a block diagonal component with
some out-of-block distances computed also.

The third step with canopy clustering is to perform clustering.
In this paper, we use standard greedy agglomerative
clustering (GAC) with a stopping threshold. Multiple standard
link criteria were considered including minimum, maximum,
and average. Although GAC is computationally expensive, it
is a standard well-performing approach that serves as a baseline.
Alternate approaches may have lower computational burden [13].
Another comment on our GAC approach is that the
interpretation of sparsity in the distance matrix is non-standard.
If a distance in $D$ is not specified, it is assumed to be $\infty$,
not the standard convention of zero.

Algorithm 2 Canopy Construction Algorithm

Inputs: Speaker vectors, $\{x_i\}$, $i = 1, \ldots, n$; hashing
retrieval methods, $\mathcal{R}_j$, $j = 1, \ldots, m$; and a
similarity threshold $\tau_s$
Outputs: A canopy (set of sets), $C = \{C_1, \ldots, C_{n_c}\}$

Let $C = \{\}$, $X = \{1, \ldots, n\}$, and $i = 1$
while $X$ is not empty do
    Pick a random $a \in X$
    Perform retrievals, $R_j = \mathcal{R}_j(h_j(x_a))$ for $j = 1, \ldots, m$
    $C_i = (\cup_j R_j) \cap X$; add $C_i$ to $C$
    Calculate a similarity score for each $p$ in $C_i$, $s(p) = \frac{1}{m} |\{ j \mid p \in R_j \}|$
    Remove all $p$ from $X$ with $s(p) > \tau_s$
    $i = i + 1$
end while

4. Experiments

4.1. Experimental Setup

The experimental setup was to perform speaker clustering using
data from the NIST Speaker Recognition Evaluation (SRE)
years 2004, 2006, 2008, 2010 and 2012 [20]. I-vectors were
generated with our standard i-vector system [21, 22].

The data was subdivided into training and testing partitions.
Table 1 shows the partitioning of the speech corpora used in the
experiments. The training data was used for all hyper-parameter
training of the i-vector system as well as all of the pre-trained
parameters of the clustering and hashing functions.

Table 1: Training and testing partitions of the speech corpora

| Partition | SRE Years        | # of Speech Cuts | # of Speakers |
|-----------|------------------|------------------|---------------|
| Training  | 2004, 2005, 2006 | 17894            | 2166          |
| Testing   | 2008, 2010, 2012 | 18250            | 1835          |

4.2. Clustering Experiments

Clustering and hashing both require the setting of pre-trained
parameters such as: 1) the number of hashes used, 2) the number
of bits used in the hash, $l$, 3) the GAC clustering threshold,
and 4) the similarity threshold $\tau_s$. To construct a fair
clustering experiment, the pre-trained parameters were tuned with the
training partition and then applied to the testing data.

Two metrics were used to assess performance of the clustering
experiments: 1) adjusted mutual information (AMI) [23],
and 2) sparsity of the distance computations. AMI measures the
performance of the clustering algorithm. The AMI calculation
used is,

$$\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E\{\mathrm{MI}(U, V)\}}{\max\{H(U), H(V)\} - E\{\mathrm{MI}(U, V)\}} \qquad (6)$$

where $\mathrm{MI}(U, V)$ is the mutual information between putative
clustering set $U$ and ground truth clustering set $V$. $H(U)$ is
defined as the entropy of the set $U$. Note that an AMI of zero
corresponds to chance.
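
To make (6) concrete, the sketch below computes AMI from two label
lists using only the standard library. One assumption is made loudly
here: the expectation $E\{\mathrm{MI}(U, V)\}$ is taken under the
usual hypergeometric model of random labelings with fixed cluster
sizes, which (6) itself does not spell out. The function names are
illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_info(u, v):
    """Mutual information between two labelings of the same items."""
    n = len(u)
    cu, cv, joint = Counter(u), Counter(v), Counter(zip(u, v))
    return sum((nij / n) * math.log(n * nij / (cu[a] * cv[b]))
               for (a, b), nij in joint.items())

def expected_mutual_info(u, v):
    """E{MI} for random labelings with fixed cluster sizes (assumed hypergeometric model)."""
    n = len(u)
    total = 0.0
    for a in Counter(u).values():
        for b in Counter(v).values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                prob = math.comb(b, nij) * math.comb(n - b, a - nij) / math.comb(n, a)
                total += prob * (nij / n) * math.log(n * nij / (a * b))
    return total

def ami(u, v):
    """Adjusted mutual information with max-entropy normalization, as in (6)."""
    emi = expected_mutual_info(u, v)
    return (mutual_info(u, v) - emi) / (max(entropy(u), entropy(v)) - emi)

same = ami([0, 0, 1, 1], [1, 1, 0, 0])      # identical partitions, renamed labels
crossed = ami([0, 0, 1, 1], [0, 1, 0, 1])   # partitions that cut across each other
```

Identical partitions score 1 regardless of how the labels are named.
The crossed example has zero mutual information, so after subtracting
the chance term it scores -0.5, i.e., below chance. The ratio is
undefined when both labelings consist of a single cluster.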

The experiments in this section required sweeping multiple
parameters: 1) number of hash bits, 2) number of hash functions,
3) GAC threshold, and 4) similarity clustering threshold.
All were conducted by taking random draws of 1000 i-vectors
and then conducting clustering experiments over the 1000 vectors.
The results were then ensemble averaged.

The sparsity of distance computations is the percentage of
distances not computed over the entire set to be clustered. Most
clustering methods require the full matrix of distance computations,
or $O(n^2)$ computations. This metric evaluates the computational
savings of our proposed clustering algorithm.
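
A small sketch of the sparsity metric is given below. It assumes that
sparsity is counted over the $n(n-1)/2$ unordered pairs of distinct
vectors and that a pair shared by overlapping canopies is computed
only once; the exact counting convention is not stated above, and the
function name is illustrative.

```python
def distance_sparsity(canopies, n):
    """Percentage of the n*(n-1)/2 unordered pairs whose distance is never computed."""
    computed = set()
    for canopy in canopies:
        members = sorted(canopy)
        for idx, i in enumerate(members):
            for j in members[idx + 1:]:
                computed.add((i, j))   # overlapping canopies count a pair only once
    total_pairs = n * (n - 1) // 2
    return 100.0 * (1.0 - len(computed) / total_pairs)

# Three toy canopies over six items; item 2 belongs to two canopies.
canopies = [{0, 1, 2}, {2, 3}, {4, 5}]
sparsity = distance_sparsity(canopies, n=6)
```

Here 5 of the 15 possible pairs are computed, so two thirds of the
distance computations are avoided. A single canopy containing every
item gives a sparsity of zero, which corresponds to the full distance
matrix.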

Figure 1 plots two sets of clustering results using the baseline
LSH function. Similar trends are seen for the two other
hashing methods, DBH and NPH. The first plot is of adjusted
mutual information versus number of bits with a varying number
of hash functions. For a single hash function, the AMI
performance decreases as the number of bits in the hash increases.
This property is due to the fact that the hash becomes too specific
and the input points only hash to themselves and not to a locality of
points. This specificity of the retrieval can be controlled by using
multiple randomly selected hash functions [16]. The retrieval
in this case is the union of the retrievals (cf. Algorithm 2).

Figure 1: Plots of LSH for adjusted mutual information and
sparsity versus number of bits for various numbers of hash
functions.

Figure 2: Plots of NPH for adjusted mutual information and
sparsity versus number of bits for various numbers of hash
functions.

The second plot of Figure 1 presents the sparsity of distance
computations versus the number of bits in the hash. This
is also plotted with a varying number of total hash functions.
As the number of bits increases, the sparsity of distance
computations increases. Again this is due to the fact that the hash
function is becoming too specific with the increase in the number
of bits. The canopy clustering algorithm computes distances
over smaller canopies.
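
The canopy construction of Algorithm 2, in which each canopy is the
union of the retrievals from all hash functions, can be sketched as
follows. This is an illustrative reading rather than the exact
implementation used in the experiments: the similarity score is taken
to be the fraction of hash functions whose retrieval contains the
item, the hash codes are hand-written toy values, and the explicit
removal of the chosen center is an added guard that is not part of
Algorithm 2.

```python
import random
from collections import defaultdict

def build_canopies(codes, tau_s, rng):
    """codes[j][p] is the hash code of item p under hash function j (m functions)."""
    m, n = len(codes), len(codes[0])
    # One inverted index per hash function: code -> set of item indices.
    tables = []
    for per_item in codes:
        table = defaultdict(set)
        for p, bits in enumerate(per_item):
            table[bits].add(p)
        tables.append(table)

    remaining = set(range(n))
    canopies = []
    while remaining:
        a = rng.choice(sorted(remaining))             # random uncovered center
        retrievals = [tables[j][codes[j][a]] for j in range(m)]
        canopy = set().union(*retrievals) & remaining
        canopies.append(canopy)
        for p in canopy:
            score = sum(p in r for r in retrievals) / m   # fraction of matching hashes
            if score > tau_s:
                remaining.discard(p)
        remaining.discard(a)   # added guard: guarantees progress even if tau_s >= 1
    return canopies

# Toy 1-bit codes for five items under two hash functions (illustrative).
codes = [
    [(0,), (0,), (1,), (1,), (1,)],   # hash function 1
    [(0,), (0,), (0,), (1,), (1,)],   # hash function 2
]
canopies = build_canopies(codes, tau_s=0.5, rng=random.Random(0))
```

Items that match the center under only some hash functions stay in
the pool and can appear in later canopies, which is why distinct
canopies may share members. Longer codes make each retrieval smaller,
so the canopies shrink and fewer distances are computed.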

Plots of AMI and sparsity versus number of bits for the
non-parametric hashing method are shown in Figure 2. Comparing
the first plot of Figure 2 with the AMI versus bits plot of Figure 1,
the clustering performance drops off much more slowly with the
increase in the number of hashing bits. Comparing sparsity versus
number of bits in Figures 1 and 2, the sparsity of distance
computations increases at a slower rate for canopy clustering with
non-parametric hashing.

AMI and sparsity of distance computations represent a trade-off
between clustering performance and computational efficiency.
This trade-off can be explored directly by plotting AMI against
sparsity. Figure 3 shows two plots for LSH of AMI versus sparsity
over a range of numbers of hash functions. The first plot is
a clustering experiment on the training set of data and the second
plot is on the testing set. Better performing systems have
curves tending more toward the upper right of the plot. As expected,
the system performs better on the training set of data since the
system's i-vector hyper-parameters were tuned on the training
data. However, the clustering system of Figure 3 used clustering
parameters, the greedy agglomerative clustering (GAC) stopping
threshold and the similarity threshold, that were set to some
reasonable settings. A more pragmatic approach would be to set the
clustering parameters on the training set of data and then apply
the parameters to a clustering experiment.


Figure 3: Oracle plots for adjusted mutual information versus
sparsity for LSH. The plots are results of the training set and
testing set over a varied number of hash functions.


Figure 4: Plot of AMI versus sparsity for 250 hash functions.


Figure 4 presents results for AMI versus sparsity on the test
data set. The solid lines (oracle) are results when the clustering
experiment used hyper-parameters tuned on the test set. The dashed
lines (fair) are results when the clustering experiment was on
the testing data set and clustering parameters were tuned on
the training data. Figure 4 shows that the performance drops
off slightly in the fair experiments but the parameter tuning is
robust. Additionally, the figure shows that the new NPH approach
has superior performance to the other methods. Finally,
we note that there is still room for significant improvement at
high sparsity; further improvements in hashing are possible.

5.  Conclusions 

In this paper we have introduced a new locality sensitive hashing
technique, non-parametric hashing. We have also presented
a unique method of speaker clustering using canopy clustering.
When combined with hashing, these methods proved to
be a fast and effective way of clustering data. The trade-off
between computational efficiency and clustering performance
was explored with adjusted mutual information versus
computational distance sparsity plots. Future work will explore
applying these techniques to other modalities such as clustering audio
and video data.

Since the non-parametric hashing was constructed with defined
tolerance intervals or hyperspheres, we conjecture that the
non-parametric hashing method should be an unbiased estimator
of the underlying density function of the input data. Future
work will endeavor to prove this by extending the proof of [8]
to non-parametric hashing.